A Software Tool for Semi-Automatic Part-of-Speech Tagging and Sentence Accentuation in Serbian Language

Authors

  • Milan Sečujski
  • Vlado Delić
Abstract

This paper presents a software tool for semi-automatic part-of-speech (POS) tagging, annotation of morphological categories, and accentuation of texts in the Serbian language. The tool enables efficient development of tagged Serbian text corpora, since automatic assignment of POS tags and morphological categories reaches an accuracy of 87.2%. This result was obtained by testing the algorithm on a text of 36,692 words and proved to be highly dependent on the type of text. The same algorithm can be included in text-to-speech systems, where it enables correct accentuation of sentences, which in turn leads to fairly natural prosody. In the same test, accent type and position were determined for each word from the automatically assigned POS tag, morphology-related information, and certain syntactic cues, and a correct accentuation rate of 97.2% was achieved.

Slovene title: Programsko orodje za polavtomatsko oblikoskladenjsko označevanje in pripisovanje stavčnega poudarka v srbskem jeziku

Slovene abstract (translated): The article presents software for semi-automatic morphosyntactic tagging, assignment of morphological categories, and assignment of accent position to texts in the Serbian language. The tool is used for very efficient development of tagged text corpora of Serbian, since the accuracy of morphosyntactic tag assignment is around 88%. The result was obtained by testing the algorithm on a text corpus of 36,692 words and proved to depend heavily on the type of text. The same algorithm for automatic morphosyntactic tagging can also be included in text-to-speech systems, since it enables correct assignment of sentence accent, which leads to fairly natural prosody. In the test mentioned above, an accuracy of 97.2% was achieved in determining the accent type and accent position of each word on the basis of the automatically assigned morphosyntactic tags.
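To make the figures in the abstract concrete, the following minimal Python sketch shows how a per-word accuracy could be computed and what the reported rates imply for manual correction effort on the 36,692-word test text. It assumes that accuracy is the simple fraction of words with a correct label, which the abstract does not state explicitly. The function names and the toy tags are hypothetical and are not taken from the paper.

```python
def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the reference tag."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def expected_corrections(total_words, accuracy):
    """Approximate number of words an annotator would still have to fix."""
    return round(total_words * (1.0 - accuracy))


# Toy example with hypothetical tags (not the paper's tag set).
gold = ["NOUN", "VERB", "ADJ", "NOUN", "PREP"]
predicted = ["NOUN", "VERB", "NOUN", "NOUN", "PREP"]
toy_accuracy = tagging_accuracy(predicted, gold)  # 4 of 5 correct -> 0.8

# Figures quoted in the abstract: 36,692 words, 87.2% and 97.2%.
pos_fixes = expected_corrections(36692, 0.872)     # 4697
accent_fixes = expected_corrections(36692, 0.972)  # 1027
```

Under that assumption, an annotator using the tool would need to correct roughly 4,700 tag assignments and about 1,000 accent assignments in the test text, rather than labelling all 36,692 words by hand.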

Similar resources

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...

A part-of-speech tagging system for the Persian language

Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

A comparative study of the effect of part-of-speech tagging on parsing in automatic processing of the Persian language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Design and Implementation of an Intelligent Part of Speech Generator

The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...

Part-of-speech tagging of the Persian language using a fuzzy network model

Part-of-speech tagging (POS tagging) is an ongoing research topic in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...

Publication date: 2006